REVERSALS OF LEAST-SQUARES ESTIMATES AND
MODEL-INDEPENDENT ESTIMATION FOR DIRECTIONS OF UNIQUE EFFECTS

BRIAN KNAEBLE AND SETH DUTTER 


Abstract. When a linear model is adjusted to control for additional explanatory
variables the sign of a fitted coefficient may reverse. Here these reversals
are studied using coefficients of determination. The resulting theory can be
used to determine directions of unique effects in the presence of substantial
model uncertainty. This process is called model-independent estimation when
the estimates are invariant across changes to the model structure. When a
single covariate is added, the reversal region can be understood geometrically
as an elliptical cone of two nappes with an axis of symmetry relating to a
best-possible condition for a reversal using a single coefficient of
determination. When a set of covariates is added to a model with a single
explanatory variable, model-independent estimation can be implemented using
subject matter knowledge. More general theory with partial coefficients is
applicable to analysis of large data sets. Applications are demonstrated with
dietary health data from the United Nations. Necessary conditions for Simpson’s
paradox are derived.


1. Introduction 


A multivariate statistical model may be useful for predicting values of some
variables from values of other variables for individuals throughout a population of
study, yet the same model may be inaccurate when used to estimate the effects
of experimental manipulation on the same individuals. For example, standardized
test scores of students can be predicted using information about school type, but
effects of transferring students from one school to another may be difficult to
ascertain. In general, models may suggest effects that are confounded by a set of
lurking variables. Pearl (2009b) gives a causal definition for confounding in his
book Causality, in contrast to definitions based on associational criteria used by
“epidemiologists, biostatisticians, social scientists, and economists.” Greenland and
Morgenstern (2001) observe how in health research the term confounding has been
used to refer to at least four distinct concepts: bias in estimating causal effects,
noncollapsibility, inseparability of main effects and interactions, and inherent
differences between variables measured and underlying constructs of interest. Here
we use the term confounding to refer to bias in estimating causal effects. For
further reading on confounding and related topics see Rosenbaum and Rubin (1983b),
McNamee (2003), and Howards et al. (2012).


Concerns about confounding lead to discussion of statistical adjustment. Cox
(1958, Chapter 4) defines a concomitant variable through discussion of concomitant
observations, which are supplementary observations (on a supplementary variable)
that may be used to increase precision (of treatment effect estimates). He describes
how to adjust results for what “would have been obtained had it been possible to
make the concomitant variable the same for all (individuals).” When fixed, the
concomitant variable cannot be responsible for observed variation in the outcome
variable. Adjustment is thus a way to mimic experimental control, and through
adjustment researchers may say that they have controlled for a confounding
variable. Here we use the term covariate when referring to any variable that may be
controlled for to facilitate adjustment. For more reading on adjustment methods
that control for confounding see Lu (2009).


Controlling for too many variables can be problematic (Chatfield 1995; Hawkins
2004), and controlling for certain types of variables can increase bias (Robins and
Greenland 1986; Weinberg 1993; Scarborough et al. 2010; Myers et al. 2011;
Pearl 2011). When subject matter specialists agree on the structure of a causal
diagram it can be used to select an admissible set of covariates for adjustment
(Pearl 2009a). McNamee (2005) gives general advice for selecting a model,
suggesting that both subject matter knowledge and statistical information should be
used. This was the approach taken by Davis et al. (2012) during their study of
the effect of rice consumption on (internal) exposure to arsenic in children. They
analyzed data, not only for “rice consumption” and “urinary arsenic concentration”
(an indicator of recent exposure to arsenic), but also for “age”, “body mass index”,
“water source”, and other covariates. They controlled for three different subsets
of covariates by fitting three different multiple regression models, and within each
model they interpreted the fitted coefficient for “rice consumption” as an adjusted
estimate for the unique effect of rice consumption on arsenic exposure. All three
adjusted estimates were found to be statistically significant, yet the authors
concluded that their study only “suggests that rice consumption is a potential source
of arsenic exposure.” The authors displayed an awareness of what Chatfield (1995)
calls model uncertainty. For additional examples of regression in the presence of
model uncertainty see Jungert et al. (2012), Nelson et al. (2012), Cervellati et al.
(2013), and Lignell et al. (2013).


We have discussed terminology and established context in order to state the
central idea of this paper: if some aspect of an uncertain model is shown to be
insensitive to adjustment by control for any subset of a larger set of covariates, and
if all confounding variables are known to be within that larger set of covariates,
then causal interpretation is more acceptable than otherwise. In this way causal
conclusions may be obtained with a combination of subject matter knowledge and
sensitivity analysis. We thus seek to develop useful mathematics that facilitates
such sensitivity analysis. We adopt the general context of linear regression, and we
assume the principle of least squares. Our objective is to identify simple conditions
that can be used to ensure that estimates for directions of unique effects are
invariant across many different model extensions. The general process of using these
conditions for the purpose of estimation is called model-independent estimation,
and the mathematics associated with directions of effects is referred to as analysis
of reversals. The main results are presented in Section 2. Proofs are in Section 3.
Necessary conditions for Simpson’s paradox are derived in Section 4.1. Applications
are demonstrated in Section 4.2. Further discussion occurs in Section 5.
2. Results


Let $\mathbf{y}$ denote a matrix with a single column of response data associated
with the response variable $Y$. Let $\mathbf{x}$ denote a matrix with a single column
of explanatory data associated with the explanatory variable $X$. Let
$\mathbf{w} = [\mathbf{w}_1, \ldots, \mathbf{w}_p]$ denote a matrix with $p$ columns of
covariate data, associated with covariates $W_1, \ldots, W_p$. Let
$\mathbf{u} = [\mathbf{u}_1, \ldots, \mathbf{u}_k]$ denote a matrix with $k$ additional
columns of covariate data, associated with covariates $U_1, \ldots, U_k$. Let
$\mathbf{e}$ denote a matrix with a single column of ones. All matrices have $n$ rows,
and each row of $[\mathbf{y}\ \mathbf{x}\ \mathbf{w}\ \mathbf{u}]$ represents a
multivariate observation on a single individual. We refer to the columns of
$[\mathbf{e}\ \mathbf{y}\ \mathbf{x}\ \mathbf{w}\ \mathbf{u}]$ as vectors and assume
that each subset of vectors is linearly independent and non-orthogonal.

The mathematics herein requires notation capable of representing coefficients
across multiple models. Let $\mathbf{m} = [\mathbf{m}_2 \ldots \mathbf{m}_l]$ denote a
generic matrix with $l - 1$ columns and $n$ rows, and let $\mathbf{m}_1$ and
$\mathbf{z}$ denote generic vectors each with $n$ entries. When $\mathbf{z}$ is
regressed onto $[\mathbf{e}\ \mathbf{m}]$ we write $R^2(\mathbf{m}, \mathbf{z})$ for the
coefficient of determination and $R(\mathbf{m}, \mathbf{z})$ for its positive square
root. For any $j$ we write $r(\mathbf{m}_j, \mathbf{z})$ for the correlation between
$\mathbf{m}_j$ and $\mathbf{z}$. We write $\mathbf{z}_{|\mathbf{m}}$ for the residual
vector $\mathbf{z} - \hat{\mathbf{z}}(\mathbf{m})$, where $\hat{\mathbf{z}}(\mathbf{m})$
is the vector of fitted values. A hat vector with a null argument is interpreted as
the zero vector. In place of
$[\mathbf{u}_{1|\mathbf{w}} \ldots \mathbf{u}_{k|\mathbf{w}}]$ we write
$\mathbf{u}_{|\mathbf{w}}$. When $\mathbf{z}$ is regressed onto
$[\mathbf{e}\ \mathbf{m}_1\ \mathbf{m}]$ we write
$\beta_{\mathbf{m}_1|\mathbf{m}}(\mathbf{z})$ for the least-squares fitted coefficient
of $\mathbf{m}_1$. Similar notation is used for other least-squares coefficients. If
$\mathbf{m}$ naturally decomposes column-wise we may then express $\mathbf{m}$ as a
set of components separated by commas.


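Proposition 2.1. A reversal,

$$\operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w},\mathbf{u}}(\mathbf{y})) \neq \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y})),$$

occurs if and only if

$$\frac{R(\mathbf{u}_{|\mathbf{w}}, \mathbf{x}_{|\mathbf{w}})\, R(\mathbf{u}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}})\, r\big(\widehat{\mathbf{x}_{|\mathbf{w}}}(\mathbf{u}), \widehat{\mathbf{y}_{|\mathbf{w}}}(\mathbf{u})\big)}{r(\mathbf{x}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}})} > 1. \tag{2.1}$$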
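As a numerical illustration of Proposition 2.1, the following sketch compares the ratio in (2.1) with the observed sign change on simulated data. It uses only the Python standard library. The helper names (`residual`, `slope`, `reversal_ratio`, `check`) are hypothetical and introduced here for illustration, and the hat vectors are fitted from the residualized covariates $\mathbf{u}_{|\mathbf{w}}$, which is how Section 3 uses them; that reading is an assumption of the sketch.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def residual(z, cols):
    """Residual of z after least-squares regression onto the given columns."""
    basis = []
    for c in cols:  # modified Gram-Schmidt on the regressors
        c = list(c)
        for b in basis:
            t = dot(c, b)
            c = [ci - t * bi for ci, bi in zip(c, b)]
        length = norm(c)
        if length > 1e-12:
            basis.append([ci / length for ci in c])
    r = list(z)
    for b in basis:
        t = dot(r, b)
        r = [ri - t * bi for ri, bi in zip(r, b)]
    return r


def slope(y, x, covs):
    """Fitted coefficient of x when y is regressed onto [e x covs]."""
    ones = [1.0] * len(y)
    xr = residual(x, [ones] + covs)
    return dot(xr, y) / dot(xr, xr)


def reversal_ratio(y, x, w, u):
    """Left-hand side of (2.1); w and u are lists of columns.

    Assumption: the hat vectors are fitted from u|w, as in Section 3.
    """
    ones = [1.0] * len(y)
    xw = residual(x, [ones] + w)
    yw = residual(y, [ones] + w)
    uw = [residual(c, [ones] + w) for c in u]
    # Fitted values are the part of each residual vector explained by u|w.
    x_hat = [a - b for a, b in zip(xw, residual(xw, uw))]
    y_hat = [a - b for a, b in zip(yw, residual(yw, uw))]
    R_ux = norm(x_hat) / norm(xw)
    R_uy = norm(y_hat) / norm(yw)
    r_hat = dot(x_hat, y_hat) / (norm(x_hat) * norm(y_hat))
    r_xy = dot(xw, yw) / (norm(xw) * norm(yw))
    return R_ux * R_uy * r_hat / r_xy


def check(trials=200, n=12, p=1, k=3, seed=0):
    """Count trials where 'ratio > 1' matches an observed sign reversal."""
    rng = random.Random(seed)
    agree = reversals = 0
    for _ in range(trials):
        cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2 + p + k)]
        y, x, w, u = cols[0], cols[1], cols[2:2 + p], cols[2 + p:]
        reversed_sign = (slope(y, x, w + u) > 0) != (slope(y, x, w) > 0)
        agree += (reversal_ratio(y, x, w, u) > 1) == reversed_sign
        reversals += reversed_sign
    return agree, reversals
```

Calling `check()` returns the number of trials in which the criterion and the fitted signs agree, together with the number of reversals observed; under the proposition the first count should equal the number of trials.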


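We refer to $r(\mathbf{x}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}})$ as the partial
correlation between $\mathbf{x}$ and $\mathbf{y}$ given $\mathbf{w}$, and we denote it
with $r_{x,y|w}$. Likewise, we write $R_{u,x|w}$ for
$R(\mathbf{u}_{|\mathbf{w}}, \mathbf{x}_{|\mathbf{w}})$ and $R_{u,y|w}$ for
$R(\mathbf{u}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}})$. Since $|r| \le 1$ and
additional explanatory columns cannot decrease $R$, we have the following corollaries.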

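Corollary 2.1. Let $\mathbf{s}$ be any subset of $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$. Then

$$R_{u,x|w}\, R_{u,y|w} < |r_{x,y|w}| \implies \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w},\mathbf{s}}(\mathbf{y})) = \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y})).$$

Definition 2.1.

$$\mathbf{v}(\mathbf{x}, \mathbf{y}; \mathbf{w}) = \frac{\mathbf{x}_{|\mathbf{w}}}{|\mathbf{x}_{|\mathbf{w}}|} + \frac{\mathbf{y}_{|\mathbf{w}}}{|\mathbf{y}_{|\mathbf{w}}|}.$$

Definition 2.2.

$$r^* = \left| 2 r_{x,y|w} / (r_{x,y|w} + 1) \right|.$$

Corollary 2.2. Let $\mathbf{s}$ be any subset of $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$. Then

$$R^2(\mathbf{u}_{|\mathbf{w}}, \mathbf{v}) < r^* \implies \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w},\mathbf{s}}(\mathbf{y})) = \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y})).$$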

The conclusions of both corollaries are identical. Proposition 2.1 is logically
stronger than Corollary 2.1, and Corollary 2.1 is logically stronger than Corollary
2.2. The condition within Corollary 2.2 is best possible (see Remark 3.1) based
on a single coefficient of determination for our desired conclusion. The conclusion
makes model-independent estimation possible for the direction of an effect. All
$2^k$ subsets of $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ are handled simultaneously.
When $k$ is large, model-independent estimation can complement Bayesian model
averaging (see Hoeting et al. (1999)). When $\mathbf{w} = \emptyset$,
model-independent estimation can be implemented using only subject matter knowledge
regarding hypothetical $\mathbf{u}$. This is because coefficients of determination
are intuitive. Intuition for reversals of least-squares estimates and intuition
relating to general adjustment of regression models can be improved through study of
the imagery in Figure 1. When $\mathbf{u}$ refers to a single vector then
$|r(\widehat{\mathbf{x}_{|\mathbf{w}}}(\mathbf{u}), \widehat{\mathbf{y}_{|\mathbf{w}}}(\mathbf{u}))| = 1$,
and the condition $R_{u,x|w} R_{u,y|w} > |r_{x,y|w}|$ is necessary and sufficient for
a reversal. Within the space orthogonal to the columns of $[\mathbf{e}\ \mathbf{w}]$
the reversal region for $\mathbf{u}_{|\mathbf{w}}$ is an ellipsoidal cone of two
nappes (see (3.12)), with axis of symmetry along $\mathbf{v}$ and boundary vectors
having coefficients of determination greater than or equal to $r^*$.
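The way Corollary 2.2 handles all $2^k$ subsets with one coefficient of determination can be sketched as follows. The code computes $\mathbf{v}$ (Definition 2.1), $R^2(\mathbf{u}_{|\mathbf{w}}, \mathbf{v})$ and $r^*$ (Definition 2.2), and then, only as a cross-check, fits every subset by brute force. All function names and the simulated data are hypothetical, and the sketch assumes a positive partial correlation $r_{x,y|w}$, which the proof in Section 3 arranges without loss of generality.

```python
import itertools
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def residual(z, cols):
    """Residual of z after least-squares regression onto the given columns."""
    basis = []
    for c in cols:  # modified Gram-Schmidt on the regressors
        c = list(c)
        for b in basis:
            t = dot(c, b)
            c = [ci - t * bi for ci, bi in zip(c, b)]
        length = norm(c)
        if length > 1e-12:
            basis.append([ci / length for ci in c])
    r = list(z)
    for b in basis:
        t = dot(r, b)
        r = [ri - t * bi for ri, bi in zip(r, b)]
    return r


def slope(y, x, covs):
    """Fitted coefficient of x when y is regressed onto [e x covs]."""
    ones = [1.0] * len(y)
    xr = residual(x, [ones] + covs)
    return dot(xr, y) / dot(xr, xr)


def corollary_2_2(y, x, w, u):
    """Return (r, R2, r_star) with r = r_{x,y|w} and R2 = R^2(u|w, v)."""
    ones = [1.0] * len(y)
    xw = residual(x, [ones] + w)
    yw = residual(y, [ones] + w)
    uw = [residual(c, [ones] + w) for c in u]
    nx, ny = norm(xw), norm(yw)
    r = dot(xw, yw) / (nx * ny)
    v = [a / nx + b / ny for a, b in zip(xw, yw)]  # Definition 2.1
    v_res = residual(v, uw)
    R2 = 1.0 - dot(v_res, v_res) / dot(v, v)
    r_star = abs(2 * r / (r + 1))  # Definition 2.2
    return r, R2, r_star


def any_reversal(y, x, w, u):
    """True if adjusting for some nonempty subset of u flips the sign."""
    base = slope(y, x, w) > 0
    for size in range(1, len(u) + 1):
        for s in itertools.combinations(u, size):
            if (slope(y, x, w + list(s)) > 0) != base:
                return True
    return False


def demo(seed=1, n=40, k=3):
    """Simulated data with a strong positive x-y relation (hypothetical)."""
    rng = random.Random(seed)
    col = lambda: [rng.gauss(0, 1) for _ in range(n)]
    x = col()
    y = [a + 0.3 * rng.gauss(0, 1) for a in x]
    w, u = [col()], [col() for _ in range(k)]
    r, R2, r_star = corollary_2_2(y, x, w, u)
    return R2 < r_star, any_reversal(y, x, w, u)
```

In practice only `corollary_2_2` would be evaluated; the exhaustive loop in `any_reversal` is exactly the computation that the corollary is meant to make unnecessary when $k$ is large.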



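Figure 1. A vector $\mathbf{u}$ has induced a reversal,
$\operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w},\mathbf{u}}(\mathbf{y})) \neq \operatorname{sign}(\beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y}))$,
if and only if within the span of
$\{\mathbf{x}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}}, \mathbf{u}_{|\mathbf{w}}\}$ we
have $\mathbf{u}_{|\mathbf{w}}$ or $-\mathbf{u}_{|\mathbf{w}}$ positioned within the
red, elliptical cone. The blue, spherical cone relates to Corollary 2.2, and the
square of the correlation between either purple vector and $\mathbf{v}$ is $r^*$.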
3. Proofs


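Corollary 2.1 is a ready consequence of Proposition 2.1. It thus remains to
prove Proposition 2.1 and Corollary 2.2. Note how with $\perp$ indicating
orthogonality between sets of vectors we have

$$\{\mathbf{e}, \mathbf{w}_1, \ldots, \mathbf{w}_p\} \perp \{\mathbf{x}_{|\mathbf{w}}, \mathbf{y}_{|\mathbf{w}}, \mathbf{u}_{1|\mathbf{w}}, \ldots, \mathbf{u}_{k|\mathbf{w}}\}, \tag{3.1}$$

and therefore

$$\beta_{\mathbf{x}|\mathbf{w},\mathbf{u}}(\mathbf{y}) = \beta_{\mathbf{x}_{|\mathbf{w}}|\mathbf{u}_{|\mathbf{w}}}(\mathbf{y}_{|\mathbf{w}}) \quad \text{and} \quad \beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y}) = \beta_{\mathbf{x}_{|\mathbf{w}}}(\mathbf{y}_{|\mathbf{w}}). \tag{3.2}$$

Let $x$ stand for $\mathbf{x}_{|\mathbf{w}}$, $y$ stand for $\mathbf{y}_{|\mathbf{w}}$,
and $u = [u_1 \cdots u_k]$ stand for $\mathbf{u}_{|\mathbf{w}}$. To prove
Proposition 2.1 it thus suffices to demonstrate

$$\operatorname{sign}(\beta_{x|u}(y)) \neq \operatorname{sign}(\beta_x(y)) \iff \frac{R(u, x)\, R(u, y)\, r(\hat{x}(u), \hat{y}(u))}{r(x, y)} > 1.$$

Since $\{x, y, u_1, \ldots, u_k\} \perp \{\mathbf{e}\}$, each element of the set
$\{x, y, u_1, \ldots, u_k\}$ is a centered (mean zero) vector. We can assume also that
each vector is unit length and that $\{u_1, \ldots, u_k\}$ is an orthonormal subset.
When $y$ is regressed onto $[x\ u]$ the vector of fitted coefficients is

$$\beta = \left[\beta_{x|u}(y)\ \ \beta_{u_1|x,u_2,\ldots,u_k}(y)\ \cdots\ \beta_{u_k|x,u_1,\ldots,u_{k-1}}(y)\right]^t.$$

With $A = [x\ u_1 \cdots u_k]$ the normal equations are

$$(A^t A)\,\beta = A^t y.$$

Set $B = [y\ u_1 \cdots u_k]$. Replacing the first column of $A^t A$ with $A^t y$
produces the matrix $A^t B$, and by Cramer’s rule

$$\beta_{x|u}(y) = \frac{\det(A^t B)}{\det(A^t A)}.$$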

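Because $u$ is orthonormal,

$$A^t B = \begin{bmatrix} \langle x, y\rangle & \langle x, u_1\rangle & \cdots & \langle x, u_k\rangle \\ \langle u_1, y\rangle & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \langle u_k, y\rangle & 0 & \cdots & 1 \end{bmatrix} \quad \text{and} \quad A^t A = \begin{bmatrix} \langle x, x\rangle & \langle x, u_1\rangle & \cdots & \langle x, u_k\rangle \\ \langle u_1, x\rangle & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \langle u_k, x\rangle & 0 & \cdots & 1 \end{bmatrix}. \tag{3.3}$$

The determinants from (3.3) can thus be evaluated using the Leibniz formula, and
the result is

$$\beta_{x|u}(y) = \frac{\langle x, y\rangle - \sum_{i=1}^{k} \langle x, u_i\rangle \langle u_i, y\rangle}{\langle x, x\rangle - \sum_{i=1}^{k} \langle x, u_i\rangle \langle u_i, x\rangle} = \frac{\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle}{\langle x, x\rangle - \langle \hat{x}(u), \hat{x}(u)\rangle}. \tag{3.4}$$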

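With centered, unit-length data, we have $\beta_x(y) = \langle x, y\rangle = r(x, y)$,
$R(u, x) = |\hat{x}(u)|$, and $R(u, y) = |\hat{y}(u)|$. These observations allow us to
manipulate (3.4). After multiplying by
$a := \langle x, x\rangle - \langle \hat{x}(u), \hat{x}(u)\rangle$ the result is

$$a\,\beta_{x|u}(y) = \langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle \tag{3.5}$$

$$\frac{a\,\beta_{x|u}(y)}{\beta_x(y)} = 1 - \frac{\langle \hat{x}(u), \hat{y}(u)\rangle}{r(x, y)}$$

$$\frac{a\,\beta_{x|u}(y)}{\beta_x(y)} = 1 - \frac{|\hat{x}(u)|\,|\hat{y}(u)|}{r(x, y)} \cdot \frac{\langle \hat{x}(u), \hat{y}(u)\rangle}{|\hat{x}(u)|\,|\hat{y}(u)|}$$

$$\frac{a\,\beta_{x|u}(y)}{\beta_x(y)} = 1 - \frac{R(u, x)\, R(u, y)\, r(\hat{x}(u), \hat{y}(u))}{r(x, y)}. \tag{3.6}$$

Because $a > 0$, we see from (3.6) that

$$\operatorname{sign}(\beta_{x|u}(y)) \neq \operatorname{sign}(\beta_x(y)) \iff \frac{R(u, x)\, R(u, y)\, r(\hat{x}(u), \hat{y}(u))}{r(x, y)} > 1.$$

This completes the proof of Proposition 2.1.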

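To demonstrate the truth of Corollary 2.2 we remain in the same context. Each
of the vectors in the set $\{x, y, u_1, \ldots, u_k\}$ is centered and unit length,
and we now additionally consider $v = x + y$. Note that $v$ is equal to $\mathbf{v}$
from Definition 2.1. We assume $\beta_x(y) > 0$. This can be assumed without loss of
generality by replacing $x$ with $-x$ if necessary. We show

$$R^2(u, v) < 2 r(x, y) / (1 + r(x, y)) \implies \beta_{x|u}(y) > 0. \tag{3.7}$$

The condition for the implication within (3.7) can be written as

$$2\langle x, y\rangle / (1 + \langle x, y\rangle) > |\hat{v}(u)|^2 / |v|^2$$

$$2\langle x, y\rangle / (1 + \langle x, y\rangle) > |\hat{v}(u)|^2 / (2(1 + \langle x, y\rangle))$$

$$4\langle x, y\rangle > |\hat{v}(u)|^2$$

$$|v|^2 + 2(\langle x, y\rangle - 1) > |\hat{v}(u)|^2$$

$$\tfrac{1}{2}\left(|v|^2 - |\hat{v}(u)|^2\right) + \langle x, y\rangle - 1 > 0$$

$$\tfrac{1}{2}\left(|v|^2 - |\hat{v}(u)|^2\right) + \langle x, y\rangle - \left(\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle\right) - 1 > -\left(\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle\right). \tag{3.8}$$

Since
$|\hat{x}(u)|\,|\hat{y}(u)| \ge \langle \hat{x}(u), \hat{y}(u)\rangle = \langle x, y\rangle - (\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle)$
we have

$$\tfrac{1}{2}\left(|v|^2 - |\hat{v}(u)|^2\right) + |\hat{x}(u)|\,|\hat{y}(u)| - 1 > -\left(\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle\right),$$

and via Jensen’s inequality
$\tfrac{1}{2}(|\hat{x}(u)|^2 + |\hat{y}(u)|^2) \ge \tfrac{1}{4}(|\hat{x}(u)| + |\hat{y}(u)|)^2 \ge |\hat{x}(u)|\,|\hat{y}(u)|$.
Therefore,

$$\tfrac{1}{2}\left(|v|^2 - |\hat{v}(u)|^2\right) + \tfrac{1}{2}\left(|\hat{x}(u)|^2 + |\hat{y}(u)|^2\right) - 1 > -\left(\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle\right). \tag{3.9}$$

Remark 3.1. (3.8) and (3.9) are logically equivalent if and only if
$\hat{x}(u) = \hat{y}(u)$.

Completing the square gives

$$\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle = \tfrac{1}{2}\left(|v|^2 - |\hat{v}(u)|^2\right) + \tfrac{1}{2}\left(|\hat{x}(u)|^2 + |\hat{y}(u)|^2\right) - 1. \tag{3.10}$$

Using (3.10) to substitute $\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle$
for $\tfrac{1}{2}(|v|^2 - |\hat{v}(u)|^2) + \tfrac{1}{2}(|\hat{x}(u)|^2 + |\hat{y}(u)|^2) - 1$
within (3.9) thus leads to

$$\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle > 0,$$

which by (3.5) is the desired conclusion of (3.7). This completes the proof of
Corollary 2.2.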

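By (3.1) and (3.2), when $k = 1$, the reversal region for $\mathbf{u}$ consists of
those points within the column space that project onto a region, $V(x, y)$, within the
space that is orthogonal to the columns of $[\mathbf{e}\ \mathbf{w}]$. To see how $V$
is an ellipsoidal cone, set $r = |r(x, y)|$, scale $x$ and $y$ (perhaps negatively),
and select orthonormal coordinates for that space of $m = n - p - 1$ dimensions so that

$$x = \left(-\sqrt{\tfrac{1-r}{2}},\ \sqrt{\tfrac{1+r}{2}},\ 0, \ldots, 0\right) \quad \text{and} \quad y = \left(\sqrt{\tfrac{1-r}{2}},\ \sqrt{\tfrac{1+r}{2}},\ 0, \ldots, 0\right).$$

Let $u = (u_1, \ldots, u_m)$ be a variable vector in that same space. We have

$$\hat{x}(u) = \left(-u_1\sqrt{\tfrac{1-r}{2}} + u_2\sqrt{\tfrac{1+r}{2}}\right)\frac{u}{|u|^2} \quad \text{and} \quad \hat{y}(u) = \left(u_1\sqrt{\tfrac{1-r}{2}} + u_2\sqrt{\tfrac{1+r}{2}}\right)\frac{u}{|u|^2}.$$

Therefore, via (3.5), $\beta_{x|u}(y) = 0$ if and only if

$$2r = 2\langle \hat{x}(u), \hat{y}(u)\rangle$$

$$2r = 2\left(-u_1\sqrt{\tfrac{1-r}{2}} + u_2\sqrt{\tfrac{1+r}{2}}\right)\left(u_1\sqrt{\tfrac{1-r}{2}} + u_2\sqrt{\tfrac{1+r}{2}}\right)\frac{1}{|u|^2}$$

$$2r = \frac{1}{|u|^2}\left(u_2^2(1+r) - u_1^2(1-r)\right)$$

$$2r|u|^2 = u_2^2(1+r) - u_1^2(1-r). \tag{3.11}$$

With $u_2 = 1$, line (3.11) can be written as

$$2r(u_1^2 + 1 + u_3^2 + \ldots + u_m^2) = (1 + r) + (r - 1)u_1^2$$

$$(r + 1)u_1^2 + 2r(u_3^2 + \ldots + u_m^2) = 1 - r$$

$$\frac{1+r}{1-r}\,u_1^2 + \frac{2r}{1-r}\left(u_3^2 + \ldots + u_m^2\right) = 1. \tag{3.12}$$

Since scaling of $u$ does not affect $\beta_{x|u}(y)$, the zero set
$\{u : \beta_{x|u}(y) = 0\}$ is conical, of two nappes, with ellipsoidal
cross-sections. The cross-sections are approximately spherical for large values of $r$.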
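The boundary described by (3.12) can be checked numerically in three dimensions. The sketch below places $x$ and $y$ in the coordinates used above (unit length with inner product $r$, which is how the garbled display is read here and should be treated as an assumption), picks points with $u_2 = 1$ on, inside, and outside the ellipse, and evaluates the numerator $\langle x, y\rangle - \langle \hat{x}(u), \hat{y}(u)\rangle$ from (3.5). The function names are hypothetical.

```python
import math


def adjusted_numerator(r, u):
    """<x,y> - <x_hat(u), y_hat(u)> from (3.5), for a single vector u."""
    a, b = math.sqrt((1 - r) / 2), math.sqrt((1 + r) / 2)
    x = [-a, b] + [0.0] * (len(u) - 2)
    y = [a, b] + [0.0] * (len(u) - 2)
    dot = lambda p, q: sum(s * t for s, t in zip(p, q))
    # For one vector u, <x_hat, y_hat> = <x,u><y,u> / |u|^2.
    return dot(x, y) - dot(x, u) * dot(y, u) / dot(u, u)


def cross_section_point(r, theta, scale=1.0):
    """Point with u_2 = 1; scale = 1 lies on the ellipse (3.12),
    scale < 1 lies inside it and scale > 1 lies outside it."""
    u1 = scale * math.sqrt((1 - r) / (1 + r)) * math.cos(theta)
    u3 = scale * math.sqrt((1 - r) / (2 * r)) * math.sin(theta)
    return [u1, 1.0, u3]


if __name__ == "__main__":
    r = 0.5
    for scale in (0.5, 1.0, 2.0):
        value = adjusted_numerator(r, cross_section_point(r, 0.7, scale))
        print(scale, value)
```

A negative value means the adjusted coefficient has the opposite sign to $\beta_x(y) = r > 0$, so points inside the ellipse correspond to reversals, points on it give a zero coefficient, and points outside leave the sign unchanged.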


4. Applications 

Analysis of reversals has produced the mathematical results of Section 2. These
results can be used when the direction of a unique effect is of interest, as could be the
case during study of the safety of a medical intervention for example. These results
are generally capable of handling continuous or categorical data, and they lead to
necessary conditions for Simpson’s paradox. The results are meant mainly for use
during sensitivity analysis. Sensitivity of bivariate correlation coefficients can be
assessed even in the absence of covariate data, since $r$ and $R^2$ values are readily
estimated using only subject matter knowledge. Sensitivity of multiple regression
coefficients may be assessed in a similar manner, although using subject matter
knowledge to estimate partial coefficients may be difficult. The general formulation
of results within Section 2 is meant for application during analysis of large data
sets. With a large number of covariates it may not be computationally feasible
to fit every possible model extension, yet computation of partial coefficients along
with Proposition 2.1 can make model-independent estimation possible.
Model-independent estimation is demonstrated in Section 4.2 with dietary health data
from the United Nations. Section 4.1 shows how an occurrence of Simpson’s paradox
implies the reversal of a least-squares estimate, but not vice versa.


4.1. Simpson’s paradox. 


Definition 4.1. Simpson’s paradox is the designation for a surprising situation
that may occur when two populations are compared with respect to the incidence of
some attribute: if the populations are separated in parallel into a set of descriptive
categories, the population with higher overall incidence may yet exhibit a lower
incidence within each such category (Wagner 1982).

For examples of Simpson’s paradox see Section 2 or Section 3 of Wagner’s article
or the examples section of Julious and Mullee (1994). Another well known example
occurred when the University of California at Berkeley was sued for gender bias.
Overall, female graduate school applicants were being admitted at a lower rate
than males, but within most departments (where autonomous decisions were being
made) females were being admitted at higher rates than males. The bias did not
reverse within every department, yet the authors still chose to describe the situation
as “a paradox, sometimes referred to as Simpson’s” (Bickel et al. 1975). Similar
terminology has been used by Appleton et al. (1996). Common to these examples
is a reversal of the purported effect of population on incidence. We thus propose a
weaker definition using the terminology of least-squares regression.


Definition 4.2. Let $\mathbf{x}$ indicate population, $\mathbf{y}$ indicate incidence,
and $\mathbf{u} = [\mathbf{u}_1 \ldots \mathbf{u}_k]$ indicate category. We say that a
reversal of the effect of $\mathbf{x}$ on $\mathbf{y}$ has occurred if

$$\operatorname{sign}(\beta_{\mathbf{x}|\mathbf{u}}(\mathbf{y})) \neq \operatorname{sign}(\beta_{\mathbf{x}}(\mathbf{y})).$$

Lemma 4.1. If Simpson’s paradox has occurred then a reversal has occurred.

Proof. There are $k + 1$ categories. Let $j$ be an element of $\{0, 1, \ldots, k\}$.
Let $\beta_x(j)$ represent the least-squares slope coefficient for $\mathbf{x}$ when
$\mathbf{y}$ is regressed onto $\mathbf{x}$ over only those data in category $j$. For
every $j$ we assume that $\beta_x(j) > 0$.

Given a regression of $\mathbf{y}$ onto
$[\mathbf{x}\ \mathbf{e}\ \mathbf{u}_1 \ldots \mathbf{u}_k]$ we have least-squares
fitted coefficients $\{\hat\beta_x, \hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k\}$.
The sum of the squares is a function of $(\beta_x, \beta_0, \beta_1, \ldots, \beta_k)$,
and $(\hat\beta_x, \hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)$ is the minimizer.
For $i \in \{0, 1\}$ let $\bar{y}(j, i)$ denote the mean of those observations in
category $j$ with $X = i$.

Observe how for every $j$ we have $\hat\beta_j < \bar{y}(j, 1)$. Thus we consider only
$(\beta_0, \beta_1, \ldots, \beta_k)$ such that $\beta_j < \bar{y}(j, 1)$ for each $j$.
Note how for any such tuple with $a > 0$ that the sum of the squares when
$\beta_x = a$ is less than the sum of the squares when $\beta_x = -a$.
Therefore $\hat\beta_x > 0$. □
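Lemma 4.1 can be illustrated with a small two-category table. The sketch below expands illustrative counts (chosen for this demonstration so that the paradox occurs, not taken from the paper) into unit-level indicator data, then compares the unadjusted coefficient $\beta_{\mathbf{x}}(\mathbf{y})$, the category-adjusted coefficient $\beta_{\mathbf{x}|\mathbf{u}}(\mathbf{y})$, and the within-category slopes. The helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residual(z, cols):
    """Residual of z after least-squares regression onto the given columns."""
    basis = []
    for c in cols:  # modified Gram-Schmidt on the regressors
        c = list(c)
        for b in basis:
            t = dot(c, b)
            c = [ci - t * bi for ci, bi in zip(c, b)]
        length = math.sqrt(dot(c, c))
        if length > 1e-12:
            basis.append([ci / length for ci in c])
    r = list(z)
    for b in basis:
        t = dot(r, b)
        r = [ri - t * bi for ri, bi in zip(r, b)]
    return r


def slope(y, x, covs):
    """Fitted coefficient of x when y is regressed onto [e x covs]."""
    ones = [1.0] * len(y)
    xr = residual(x, [ones] + covs)
    return dot(xr, y) / dot(xr, xr)


# Illustrative counts: {(category, population): (events, total)}.
COUNTS = {
    (0, 1): (81, 87), (0, 0): (234, 270),
    (1, 1): (192, 263), (1, 0): (55, 80),
}


def build_rows(counts):
    """Expand the table into unit-level columns y (event), x (population), u (category)."""
    y, x, u = [], [], []
    for (cat, pop), (events, total) in counts.items():
        for i in range(total):
            x.append(float(pop))
            u.append(float(cat))
            y.append(1.0 if i < events else 0.0)
    return y, x, u


def summarize(counts=COUNTS):
    y, x, u = build_rows(counts)
    overall = slope(y, x, [])    # difference in overall incidence
    adjusted = slope(y, x, [u])  # coefficient after adjusting for category
    within = []
    for cat in (0, 1):
        idx = [i for i, c in enumerate(u) if c == cat]
        within.append(slope([y[i] for i in idx], [x[i] for i in idx], []))
    return overall, adjusted, within
```

With these counts population 1 has the lower overall incidence but the higher incidence within each category, so the within-category slopes and the adjusted coefficient are positive while the unadjusted coefficient is negative.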


The preceding proof shows how the criterion for Simpson’s paradox is strictly
stronger than the criterion for a reversal. The corollaries of Section 2 can thus be
modified into theorems for Simpson’s paradox. Let $\mathbf{x}$ indicate population,
let $\mathbf{y}$ indicate attribute presence, and let $\mathbf{u}$ indicate category.
Let $P$ be a partition of $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ into $q$ cells,
where $1 < q < k$. Let $\mathbf{t}$ be a matrix with columns indicating cell
membership. Note that $R^2$ for $\mathbf{t}$ is less than or equal to $R^2$ for
$\mathbf{u}$. Simpson’s paradox is with respect to $\mathbf{t}$. Coefficients of
determination for sets of indicator variables are well defined as long as the same
non-zero quantity is used to indicate membership for all individuals within a
specific category. Generally, the coefficient of determination can be defined as a
geometric property of linear subspaces, and thus it is invariant under change of basis.

Theorem 4.1 (Strong, Necessary Condition for Simpson’s paradox). Simpson’s
paradox cannot occur unless

$$R(\mathbf{u}, \mathbf{x})\, R(\mathbf{u}, \mathbf{y}) > |r(\mathbf{x}, \mathbf{y})|.$$

Theorem 4.2 (Weak, Necessary Condition for Simpson’s paradox). Simpson’s
paradox cannot occur unless

$$R^2(\mathbf{u}, \mathbf{v}) > r^*.$$


Necessary conditions for reversals of least-squares estimates are necessary
conditions for Simpson’s paradox, but these conditions are not adequate for other
varieties of ecological fallacy (see Piantadosi et al. (1988)). This distinction is relevant
throughout the next subsection, where we analyze country-level effects. Analysis
of reversals can be used to determine whether or not these country-level effects are
due to categorization into continents, as might be suggested if confounding due
to ethnicity or genetic makeup is suspected, but further assumptions would be
required in order to pass to continent-level or individual-level results. Our focus here
is not on multilevel analysis nor traditional inference but rather a technique that
adjusts for an indeterminate set of covariates.


4.2. Model-independent estimation. In this subsection we demonstrate methodology
with data that were recorded in 2008 and 2009 by the United Nations. The
data were obtained in 2013 from three different sources: the World Health
Organization (WHO), the Human Development Report Office (HDRO), and the Food and
Agriculture Organization (FAO). For each of 155 countries, age-adjusted, mean,
total cholesterol levels (WHO, 2008) and Human Development Index (HDI) scores
(HDRO, 2009) were retrieved, along with per capita consumption rates for meat,
milk, eggs (FAO 2009), fish, and animal fats (FAOSTAT 2013).

HDI is an index that measures the state of human development within a country,
utilizing indicators relating to life expectancy, educational attainment, and income
per capita. Among the variables just mentioned, HDI correlates most strongly
with cholesterol levels, with a correlation coefficient of approximately $r = 0.91$.
(Henceforth we round all estimates to the nearest hundredth.) A bivariate plot of
HDI and cholesterol data is shown in Figure 2. Analysis of reversals leads to the
belief that such a strong correlation is unlikely to be reversed by controlling for
covariates.

Meat consumption is measured in kg/person/year and includes consumption of
pig, poultry, cattle, and sheep. The observed correlation between meat consumption
and cholesterol is 0.81, while the observed correlation between meat consumption
and HDI is 0.82. These numbers, while impressive, are not strong enough to induce
a reversal. The magnitude of their product, 0.66, is less than that required for a
reversal, $r = 0.91$, and therefore by Corollary 2.1 we can be sure that the direction
of the estimate for the effect of HDI on cholesterol is not sensitive to adjustment
by control for meat consumption.

Figure 2. A scatter plot (n=155) showing a strong correlation
($r \approx 0.91$) between development and mean total cholesterol levels
at the country level.

Footnote 1: These data were retrospectively selected for instructive demonstrations
of model-independent estimation. Variables were chosen for pedagogical reasons
unrelated to scientific study of cholesterol, and causal conclusions are not
intended nor implied by these demonstrations.

The actual fitted linear model of cholesterol in terms of HDI and meat consumption
gives more information. When fit over standardized data, so as to allow for
comparison across differing units, HDI remains the dominant explanatory variable.
Its fitted coefficient is 0.026, with an associated t statistic of 13.8
($p \approx 10^{-15}$), while the fitted coefficient for meat is 0.006, with a t
statistic of 3.0 ($p \approx 0.004$). The retained importance of HDI is visually
evident in Figure 3. At nearly all levels of meat consumption the estimate for the
effect of increasing HDI on cholesterol remains strongly positive.

Figure 3. A plot (n=155) showing the trivariate relationship between
color-coded, mean, total cholesterol levels, development, and
meat consumption, at the country level: conditional on meat consumption
the relationship between development and cholesterol
appears linear and the estimate for the effect of development on
cholesterol remains strongly positive; conditional on development
the estimate for the effect of meat consumption on cholesterol is
much smaller in magnitude.

Next we adjust for meat, milk, eggs, fish, and animal fat, simultaneously. A large
model of cholesterol in terms of HDI and all these dietary variables is summarized in
Table 4.1. HDI remains highly significant ($t = 8.49$, $p \approx 10^{-13}$), and its
dominance is not unexpected. We know from Corollary 2.2 that for a reversal to occur
the dietary variables’ coefficient of determination for $\mathbf{v}$ (the standardized
sum of HDI and cholesterol) must be larger than $r^* = 2r/(r + 1) = 0.96$, and
calculation reveals that this coefficient is only 0.84. Therefore, adjustment for any
subset of the dietary variables cannot induce a reversal. This conclusion could
conceivably have been reached even in the absence of data, since coefficients of
determination can be estimated with subject matter knowledge.
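The two checks used in this subsection reduce to short arithmetic, sketched below with the rounded values reported in the text. The function names are hypothetical. Note that with the rounded value $r = 0.91$ the formula of Definition 2.2 evaluates to about 0.95; the reported 0.96 presumably reflects the unrounded correlation, which is not available here.

```python
def r_star(r_xy):
    """Threshold of Definition 2.2 for a given (partial) correlation."""
    return abs(2 * r_xy / (r_xy + 1))


def product_rules_out_reversal(R_ux, R_uy, r_xy):
    """Sufficient condition of Corollary 2.1: the product is too small."""
    return R_ux * R_uy < abs(r_xy)


def single_r2_rules_out_reversal(R2_uv, r_xy):
    """Sufficient condition of Corollary 2.2: R^2(u, v) is below r*."""
    return R2_uv < r_star(r_xy)


if __name__ == "__main__":
    # Meat as the single covariate: r(meat, HDI) and r(meat, cholesterol).
    print(product_rules_out_reversal(0.82, 0.81, 0.91))
    # All five dietary variables: reported R^2 for v is 0.84.
    print(single_r2_rules_out_reversal(0.84, 0.91))
```

Both checks return `True` for the reported values, matching the conclusions that neither meat consumption alone nor any subset of the dietary variables can reverse the sign of the HDI coefficient.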

It is more difficult to estimate a partial coefficient of determination with subject
matter knowledge. Suppose that subject matter knowledge has led to a dietary
model of cholesterol in terms of only meat, milk, eggs, fish, and animal fat. This
model is summarized in Table 4.2. Note the final column, where we have included
absolute values of partial correlation coefficients. These coefficients are computed
as partial correlations between a given row’s variable and cholesterol, given the
remaining dietary data. Calculation with residual vectors reveals, using either
Corollary 2.1 or Corollary 2.2, that HDI is not capable of inducing any reversals.
With $k$ covariates similar calculation would be done for the whole set of covariates
at once.

Table 4.1. A linear model of cholesterol fit to standardized,
country-level data. HDI is the dominant explanatory variable,
even when the five dietary variables are combined into one variable,
namely the vector of fitted values from the dietary model of
Table 4.2.

explanatory variable   fitted slope coefficient   t statistic   two-sided p value
HDI                    0.58                       8.49          $\approx 10^{-13}$
meat                   0.11                       1.95          0.05
milk                   0.08                       1.50          0.14
eggs                   0.12                       2.54          0.01
fish                   0.07                       2.13          0.03
animal fat             0.10                       2.34          0.02

Table 4.2. A linear model of cholesterol that has not been adjusted
for HDI. Partial correlations have been computed between
a given row’s variable and cholesterol, given the remaining dietary
variables. Higher partial correlations indicate stability.

variable     slope coefficient   t statistic   p value              partial correlation
meat         0.35                5.93          $\approx 10^{-7}$    0.44
milk         0.22                3.70          0.0003               0.29
eggs         0.32                6.19          $\approx 10^{-8}$    0.28
fish         0.14                3.59          0.0005               0.45
animal fat   0.09                1.77          0.0786               0.14


5. Discussion 


Proposition 2.1 and its corollaries have been designed for use during analysis of
large data sets, especially when the goal is to estimate the direction of a causal effect
of $X$ on $Y$ by adjusting for covariates. Suppose a model of $\mathbf{y}$ has been
fit to $\mathbf{x}$ and $\mathbf{w}$, and confounding by some indeterminate subset
$\mathbf{s} \subset \mathbf{u}$ is suspected. There are $2^k$ subsets to consider, each
associated with a particular model extension, and it may not be feasible to fit all
possible models. However, by fitting a single model of $\mathbf{v}$ in terms of
$\mathbf{u}$, if the $R^2$ value is small compared to $r^*$, then the technique of
model-independent estimation can be implemented. That is the content of Corollary 2.2,
and Corollary 2.1 is similar.

Related theory exists within the field of econometrics. Here we have dealt with
model extensions, while econometricians have already dealt with model contractions.
They have studied reversals by assuming a larger model and an effect of
interest, along with conditions on a set of variables to be removed. Using t and F
statistics, Leamer (1975) showed how reversals can only occur if the set of variables
to be dropped is more significant than the variable of interest. Visco (1978) showed
that this condition is not sufficient, and he also derived necessary and sufficient
conditions for a reversal when only a single variable is dropped. Oksanen (1987)
rephrased the conditions using partial correlation. McAleer et al. (1986) and Giles
(1989) presented generalizations. However, using the words of Imbens (2003, p 126),
“One is not interested in what would have happened in the absence of covariates
actually observed, but in biases that are the result from not observing all relevant
covariates.”

For example, consider smoking and lung cancer. A simple causal graph is
inadequate, because of complicated relationships between smoking, lung cancer, and
confounding variables (Pearl 2009b, p 424). For instance, the US Environmental
Protection Agency (EPA) lists (indoor exposure to) radon (gas) as the second
leading cause of lung cancer in the United States (EPA, 2013), and there is evidence
of interaction between radon gas and smoke (Beir 1999, Appendix C, p 239). It
wasn’t a perfect model, but rather an inequality (Cornfield et al., 2009, Appendix
A) that played a critical role in allowing the US Surgeon General to conclude that
cigarette smoking is causally related to lung cancer in man (Lin et al., 1998). In
response to Fisher’s constitution hypothesis (Fisher 1958), Cornfield et al. stated
that “the magnitude of the excess lung-cancer risk among cigarette smokers is so
great that the results can not be interpreted as arising from an indirect association
of cigarette smoking with some other agent or characteristic, since this hypothetical
agent would have to be at least as strongly associated with lung cancer as cigarette
use; no such agent has been found or suggested.”

A limitation of reversal analysis is its emphasis on direction rather than
magnitude. There is much literature dealing more exactly with omitted variable bias. It
can be specified as a complicated matrix expression (Seber and Lee 2003, Chapter
3). It can be factored into a ratio of standard errors, an F statistic, and a partial
coefficient of determination (Hosman et al. 2010). Expressions bounding the t values
of the larger model can be written in terms of coefficients of determination, under
certain assumptions (Frank 2000). Assuming binary treatment, sensitivity can be
assessed with distributional assumptions for the confounding variables, along with
knowledge of how they affect the response (Lin et al. 1998). See also Rosenbaum
and Rubin (1983a). In general, more exact results require more detailed
assumptions. There are few assumptions underlying the analysis of reversals. Precision
has been traded for the possibility of model-independent estimation.

Analysis of reversals has produced necessary conditions for Simpson’s paradox,
revealed geometric symmetry within the column space of data sets, and led to
the possibility of model-independent estimation, a technique for identifying effects
that are invariant across a class of models. To determine the direction of an effect,
either Corollary 2.1 or Corollary 2.2 can be applied, and only basic knowledge of $r$
and $R^2$ is required. Note that $r$ alone is not sufficient. Table 5.1 gives an
example where $\mathbf{u}_1$ and $\mathbf{u}_2$ both correlate arbitrarily weakly with
both $\mathbf{x}$ and $\mathbf{y}$, yet $[\mathbf{u}_1\ \mathbf{u}_2]$ induces a
reversal. Also, partial coefficients are required. Table 5.2 gives a related
example where a single vector $\mathbf{u}$ is correlated with neither $\mathbf{x}$ nor
$\mathbf{y}$, yet it induces a reversal nonetheless, by activating a previously
dormant $\mathbf{w}$. Finally, even with $\mathbf{w} = \emptyset$ it is not possible to
conduct model-independent estimation while retaining
$r(\hat{\mathbf{x}}(\mathbf{u}), \hat{\mathbf{y}}(\mathbf{u}))$ in its entirety for
possibly stronger logical reasoning. Table 5.3 gives an example where such theory
would suggest (correctly) that a reversal is not possible due to
$[\mathbf{u}_1\ \mathbf{u}_2]$, but both $\mathbf{u}_1$ and $\mathbf{u}_2$ individually
lead to reversals.
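Table 5.1. A counterexample showing the need for $R^2$:
$\beta_{\mathbf{x}}(\mathbf{y}) = r(\mathbf{x}, \mathbf{y}) \approx 0.5$, and as
$\epsilon \downarrow 0$, $r(\mathbf{u}_1, \mathbf{x}) = r(\mathbf{u}_1, \mathbf{y}) \downarrow 0$
and $r(\mathbf{u}_2, \mathbf{x}) = r(\mathbf{u}_2, \mathbf{y}) \downarrow 0$, while
$R(\mathbf{u}, \mathbf{x}) R(\mathbf{u}, \mathbf{y}) \approx 0.75$ and
$\beta_{\mathbf{x}|\mathbf{u}}(\mathbf{y}) = -1$.

y                   x                    u_1                    u_2
$(\sqrt{2}+3)/2$    $(-\sqrt{2}+3)/2$    $\epsilon/\sqrt{2}$    $\epsilon/\sqrt{2}$
$(\sqrt{2}-3)/2$    $(-\sqrt{2}-3)/2$    $-\epsilon/\sqrt{2}$   $-\epsilon/\sqrt{2}$
$-1/2$              $1/2$                $1$                    $-1$
$-1/2$              $1/2$                $-1$                   $1$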


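Table 5.2. A counterexample showing the need for partial coefficients:
as $\delta \downarrow 0$, $\beta_{\mathbf{x}|\mathbf{w}}(\mathbf{y}) \approx 0.5$,
$r(\mathbf{u}, \mathbf{x}) = r(\mathbf{u}, \mathbf{y}) = 0$, yet
$\beta_{\mathbf{x}|\mathbf{w},\mathbf{u}}(\mathbf{y}) \approx -0.4$, while
$r(\mathbf{w}, \mathbf{x}) = r(\mathbf{w}, \mathbf{y}) \downarrow 0$.

y                   x                    w                     u
$(\sqrt{2}+3)/2$    $(-\sqrt{2}+3)/2$    $\delta/\sqrt{2}$     $0$
$(\sqrt{2}-3)/2$    $(-\sqrt{2}-3)/2$    $-\delta/\sqrt{2}$    $0$
$-1/2$              $1/2$                $1$                   $-1$
$-1/2$              $1/2$                $-1$                  $1$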


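Table 5.3. A counterexample showing how model-independent
estimation is not possible with full use of
$r(\hat{\mathbf{x}}(\mathbf{u}), \hat{\mathbf{y}}(\mathbf{u}))$ and
Proposition 2.1: $\beta_{\mathbf{x}}(\mathbf{y}) \approx 0.5$, and for small, positive
$\epsilon$ and $\delta$, as $(\epsilon, \delta) \to (0, 0)$,
$\beta_{\mathbf{x}|\mathbf{u}_1,\mathbf{u}_2}(\mathbf{y}) \to 1.0$,
$\beta_{\mathbf{x}|\mathbf{u}_1}(\mathbf{y}) \to -1.0$, and
$\beta_{\mathbf{x}|\mathbf{u}_2}(\mathbf{y}) \to -1.0$.

y                   x                    u_1                             u_2
$(\sqrt{2}+3)/2$    $(-\sqrt{2}+3)/2$    $(\epsilon+3\sqrt{2})/2$        $(-\epsilon+3\sqrt{2})/2$
$(\sqrt{2}-3)/2$    $(-\sqrt{2}-3)/2$    $(\epsilon-3\sqrt{2})/2$        $(-\epsilon-3\sqrt{2})/2$
$-1/2$              $1/2$                $(-\epsilon+\delta\sqrt{2})/2$  $(\epsilon+\delta\sqrt{2})/2$
$-1/2$              $1/2$                $(-\epsilon-\delta\sqrt{2})/2$  $(\epsilon-\delta\sqrt{2})/2$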